Skip to content

feat: add fair async task scheduling#639

Merged
eric-tramel merged 9 commits into
mainfrom
codex/fair-async-scheduler
May 13, 2026
Merged

feat: add fair async task scheduling#639
eric-tramel merged 9 commits into
mainfrom
codex/fair-async-scheduler

Conversation

@eric-tramel
Copy link
Copy Markdown
Contributor

@eric-tramel eric-tramel commented May 13, 2026

📋 Summary

Adds a virtual-time fair admission layer to the async dataset scheduler so a large ready frontier cannot let one column or model-backed generator consume the full submission window. The existing semaphore caps and LLM-wait handoff remain in place; this changes task admission order, not public APIs.

🔗 Related Issue

N/A

🔄 Changes

  • Added FairTaskQueue, a per-group FIFO plus heap-backed virtual-time scheduler for ready async tasks.
  • Replaced greedy frontier dispatch in the main scheduler and salvage drain paths with fair group admission.
  • Moved fair-queue admission/release state into the queue interface so AsyncTaskScheduler only supplies group specs and dispatches selected tasks.
  • Added model/custom/local task group derivation with per-group admitted caps for LLM-bound work.
  • Made seed column ordering deterministic while preserving direct seed dispatch, row-group admission, and stateful-generator locks.
  • Added unit coverage for queue fairness/caps/stale pruning and integration coverage for fair column and LLM-bound admission windows.
  • Updated the dataset builder architecture note to describe fair async admission.

Implementation notes

FairTaskQueue keeps one FIFO queue per scheduling group and a global min-heap ordered by virtual finish time. Each dispatch pops one eligible group, admits one task, charges 1 / weight, and reinserts the group if more work is queued. Global _submission_semaphore and _llm_wait_semaphore remain the hard capacity controls; per-group admitted counts prevent a single LLM-bound flow from filling the LLM-wait window.

The group identity includes the logical task flow, so sibling ready columns get turns even when they share the same provider/model. Model-backed groups also include provider, model, and generation type; custom generators with model aliases group by the custom flow plus alias set because the exact model call path is user code.

📊 Benchmark proof point

A scratch benchmark compared merge-base 1d203b1d with this PR at 0973d1fe using a real DataDesigner.create() run over 384 records, local mock network endpoints, 1 warmup trial, and 3 measured trials. The workflow had a slow branch plus a fast branch gate, three fast downstream branches, a deeper review task, and a terminal record_done task. Endpoint pools were capped at 32 concurrent requests each; slow requests slept 120ms, fast requests 15ms, and terminal requests 5ms.

Under an adversarial slow-first frontier order that models the slot-hoarding failure mode:

Metric Baseline This PR
First 256 cell slot acquisitions slow_extract=256 slow_extract=128, branch_gate=128
Fast pool max/initial idle time 0.209s 0.057s
Terminal pool max/initial idle time 1.150s 0.414s
Fast pool utilization until last fast completion 79.7% 92.6%
First completed record 1.152s 0.413s
Records complete by 1.0s 0 / 384 ~207 / 384
End-to-end wall time 1.581s 1.591s

The wall-clock tail is effectively unchanged because the slow endpoint remains the true bottleneck. The proof point is resource flow: the PR keeps downstream/parallel resources active much earlier instead of letting the slow frontier monopolize the scheduling window.

References

🧪 Testing

  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders/utils/test_fair_task_queue.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py -q (58 passed)
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders packages/data-designer-engine/tests/engine/models -q (983 passed)
  • make check-engine
  • make check-all
  • make test (562 config + 2010 engine + 717 interface tests passed)
  • DATA_DESIGNER_ASYNC_ENGINE=1 uv run --project tests_e2e pytest tests -q (6 passed, 2 skipped for credential-gated live-provider tests)
  • DATA_DESIGNER_ASYNC_ENGINE=1 uv run --package data-designer python docs/assets/recipes/plugin_development/markdown_seed_reader.py
  • Scratch benchmark: /tmp/dd-fair-scheduler-benchmark-rvQn7u/scripts/fair_scheduler_benchmark.py against merge-base and PR commits
  • Unit tests added/updated
  • Integration tests added/updated

✅ Checklist

  • Follows commit message conventions
  • Commits are signed off (DCO)
  • Architecture docs updated (if applicable)

Description updated with AI

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel eric-tramel requested a review from a team as a code owner May 13, 2026 02:58
@eric-tramel eric-tramel self-assigned this May 13, 2026
@github-actions
Copy link
Copy Markdown
Contributor

Code Review: PR #639 — feat: add fair async task scheduling

Summary

Adds a virtual-time fair admission layer (FairTaskSelector) on top of the existing async dataset scheduler. Replaces greedy frontier dispatch with per-group FIFO queues + min-heap selection, ordered by virtual finish time (1/weight). Per-group admitted-count caps prevent a single LLM-bound flow from filling the LLM-wait window. Public APIs and existing semaphore caps are unchanged; this is a scheduling-policy refactor.

Net change: +626 / -45 across the scheduler, a new fair_task_queue.py utility, fresh unit + integration tests, and a doc note in architecture/dataset-builders.md. Matches the project's "declare, don't orchestrate" contract — the new policy is encapsulated behind one helper class and the scheduler still owns admission accounting.

Findings

Correctness

  • Group accounting is consistently released. _admitted_by_group is decremented inside the finally block of _execute_task_inner_impl (async_scheduler.py:1037-1038), matching the existing _in_flight.discard / semaphore release pattern. Seed tasks in _admit_seed_tasks never enter _task_group_by_task, so the pop(task, None) returns None and the cleanup is a safe no-op for that path.
  • Potential tight loop in _drain_frontier. New control flow at async_scheduler.py:625-628:
    if ready and not dispatch_outcome.dispatched and not self._in_flight:
        continue
    Reaching this with ready non-empty and nothing in flight implies either a leaked submission permit or every group is group_blocked despite admitted_by_group == 0. _llm_group_admitted_limit enforces max(1, …), so the latter shouldn't fire — but if _max_llm_wait_tasks were ever set to 0 (min(0, …) == 0 followed by max(1, 0) == 1 makes this safe today, but the construction is brittle). Worth either guarding the loop with a fail-fast assert or proving impossibility with a comment.
  • sync_ready rebuilds order each pass, but does not refresh group specs for already-queued tasks. Line if task in self._queued: continue means the first observed TaskGroupSpec wins. In practice a task's spec is stable (model/provider don't change between dispatch passes), but self._group_specs[group.key] = group does still update — small inconsistency where a task's _task_groups[task] may point at one key while the latest _group_specs[key] was installed by a different task. Not a bug today; document the invariant or fold the update inside the if.
  • Stale-task pruning is correct but quadratic in pathological cases. _purge_queue_head scans from the head only; a stale task buried mid-queue stays in the deque until the head clears. Given sync_ready removes from _queued and _task_groups, a stale-deep entry will be skipped via _purge_queue_head later when popped — fine, but if a group's frontier churn is high you can accumulate dead deque entries. Probably not an issue at realistic scales.
  • Ordering after a group_blocked outcome. When pop_next re-activates blocked groups in the finally, they get a fresh _sequence via _activate_group, but _group_finish is unchanged. Good — preserves their virtual-time ordering. The _active_heap_keys set correctly prevents double-pushing.

Design / API

  • Group-spec computation per dispatch is non-trivial. _task_group_spec runs for every ready task on every pass through _dispatch_ready_tasks; it iterates aliases, calls get_model_config / get_model_provider_name, etc. With deep frontiers this could become measurable. Since the spec is a function of (column, generator, aliases) and these are immutable for the run, consider memoizing per task.column (or per id(generator)) on the scheduler. Today the work is small and likely well below dispatch cost, but the call graph touches user code via getattr / model_registry and is an obvious caching target.
  • sync_ready sorts the entire ready frontier (O(N log N) per pass). For deep frontiers this dominates; the resulting deterministic order is mostly used so peers within the same TaskGroupKey come out in (row_group, row_index) order. A cheaper alternative: skip the global sort and just sort within each group on insertion. Not blocking — the absolute cost is small relative to LLM I/O, but it's the kind of thing that grows quietly.
  • Seed columns shifted from frozenset to tuple. That's intentional (callers do set(seed_cols) where set membership is needed — _get_ready_tasks:309). Confirm _run_seeds_complete_check still iterates correctly with a tuple — the loop was already iterating over _rg_states, so it does. Also ensure no caller relies on set semantics for seed_cols (e.g., column in seed_cols); a quick grep would help.
  • Unbounded growth of _group_specs and _group_finish. Both maps accumulate entries over the scheduler's lifetime and never evict. Bounded by (#columns × #model aliases), so realistically tiny — flagging as informational.

Style / Conventions

  • Adheres to project conventions: from __future__ import annotations, modern type syntax (tuple[str, ...], dict[Task, …], int | None), absolute imports, SPDX header on the new file. Good.
  • _DispatchOutcome is a clean encapsulation; the submission_full vs group_blocked distinction is descriptive.
  • Several private static helpers (_get_model_config_for_alias, _get_model_provider_name_for_alias, _model_aliases_for_generator) live on AsyncTaskScheduler. They feel like they belong on ColumnGenerator (or a small adapter); leaving them on the scheduler couples it to generator-config conventions. Not blocking, but a follow-up worth tracking — it would shrink the scheduler and make custom-model alias-set rules more discoverable.
  • The architecture-doc update is accurate and concise; matches the implementation behavior.

Test Coverage

Coverage is solid:

  • test_fair_task_queue.py covers round-robin, weighted, stale-pruning, admit-cap, all-capped-returns-None, idempotent sync_ready — six focused unit tests on the selector itself.
  • test_async_scheduler.py adds:
    • test_scheduler_model_task_group_spec_uses_model_resource_and_flow — verifies provider/model/generation-type identity composition and weight = max_parallel_requests.
    • test_scheduler_custom_model_task_group_spec_uses_alias_set_weight — verifies alias-set composition for custom generators with multiple model_aliases (note: weight=5 = 2+3, sum across aliases).
    • test_scheduler_fair_admission_across_ready_columns — end-to-end check that within the first 4-task admission window, all 3 columns get a turn.
    • test_scheduler_fair_llm_group_cap_preserves_peer_admission — end-to-end LLM cap behavior (hot=2, peer=2 in window of 4).

Gaps worth considering:

  • No test for the salvage / drain path under fair admission. The _drain_frontier change introduced a new if ready and not dispatched and not in_flight: continue branch — a regression test that exercises a salvage scenario where dispatch is briefly group-blocked would harden the new control flow.
  • No test exercising multiple row groups with fair admission interaction. The integration tests use row_groups=[(0, 12)] and (0, 8) — single group. Multi-row-group fairness across columns is the more realistic case.
  • test_scheduler_fair_admission_across_ready_columns asserts len(set(first_window)) == 3 — that's "all three columns appear", which is a pretty loose contract. Perhaps tighten to verify each column appears within the first 3 dispatches (true round-robin).

Performance / Risk

  • Hot path. This change is on the dispatch hot path; per-pass overhead grew from a single greedy loop to: _get_ready_tasks_task_group_spec per task → sync_ready (with sort) → pop_next (heap). The PR description notes make test and full engine suites pass, but no benchmark numbers are included. Given DataDesigner's import-time discipline (make perf-import) and the central role of this scheduler, a one-line note in the PR body with before/after dispatch overhead on a representative run would help reviewers calibrate.
  • Backward compatibility. Public APIs unchanged (constructor signature gains nothing private-leaking); data_designer.interface users are unaffected. Internal _main_dispatch_loop / _drain_frontier signatures took a frozenset → tuple swap, which is an internal contract.
  • Failure modes. A bug in _task_group_spec (e.g., get_model_config raises) is caught by the broad except Exception: at line ~292 of the diff (_custom_model_group_spec(... weight=1)). Silent fallback is reasonable, but worth at least a logger.debug so this isn't invisible if a registry change breaks alias resolution.

Security

  • No new external inputs, no I/O, no string formatting that could splat secrets. The getattr(generator.config, ..., default) pattern looks safe. No SQL/shell/eval. Nothing concerning.

Verdict

Solid implementation of a well-motivated change. The fair-queue policy is reasonable, the data-structure choices (per-group deque + min-heap by virtual finish time) match the cited references, and the encapsulation of FairTaskSelector is clean. Tests cover the main happy paths well.

Recommended before merge:

  1. Add a short comment on the _drain_frontier continue branch documenting why it cannot spin (or convert to an assertion that progress was made).
  2. Consider memoizing _task_group_spec per column to avoid recomputing on every dispatch pass.
  3. Add at least one test exercising multi-row-group + fair admission to lock in cross-row-group behavior.
  4. Surface a debug-level log when _task_group_spec falls back to the custom-model path due to a get_model_config exception.

None of these are blocking; the change can land as-is and these can be follow-ups.

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel
Copy link
Copy Markdown
Contributor Author

Addressed the review action items in follow-up commit 0d8aeacb:

  • Hardened the impossible/no-progress _drain_frontier state by failing fast with a RuntimeError instead of looping when ready work exists but no admission capacity or in-flight task can make progress.
  • Memoized task group specs per generator instance for the duration of a scheduler run, so model alias/provider resolution is not repeated per cell task.
  • Added debug-level fallback logging with exception info when model alias/provider resolution fails and the scheduler falls back to a custom-model scheduling group.
  • Added regression coverage for the no-spin drain invariant, per-generator spec caching, debug fallback logging, stricter single-row-group fair admission, and multi-row-group fair admission across ready columns.

Validation run locally:

  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders/utils/test_fair_task_queue.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py -q -> 62 passed
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders packages/data-designer-engine/tests/engine/models -q -> 987 passed
  • make check-engine -> passed
  • make check-all -> passed
  • make test -> config 562 passed, engine 2014 passed, interface 717 passed

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Greptile Summary

This PR introduces FairTaskQueue, a virtual-time weighted fair scheduler (inspired by WFQ/SFQ) layered between the existing semaphore capacity controls and the frontier-driven task dispatch, ensuring no single column or model-backed generator can monopolize the submission window when the ready frontier is large.

  • Adds FairTaskQueue with per-group FIFO queues and a min-heap ordered by virtual finish time; integrates it into AsyncTaskScheduler so all non-seed frontier tasks are admitted through the queue rather than dispatched greedily.
  • Refactors CompletionTracker mutation methods to return FrontierDelta objects, replacing the previous direct-write-to-frontier side effects so the scheduler can apply incremental frontier updates through one path (_apply_frontier_delta).
  • Adds SchedulingHintResolver to derive per-generator scheduling metadata (group kind, model identity, max_parallel_requests weight) once at scheduler init, and caches TaskGroupSpec per generator instance to avoid repeated resolution during dispatch.

Confidence Score: 5/5

Safe to merge; the fair-admission layer slots cleanly between the existing semaphore caps and the frontier dispatcher without touching public APIs or data-persistence paths.

The changes are internally consistent: seed dispatch and stateful-lock ordering bypass the fair queue as before, FrontierDelta propagation is wired through a single path, the deadlock guard in _drain_frontier is logically sound, and the 58-test suite covering queue fairness, LLM-bound admission, and salvage integration gives solid confidence.

fair_task_queue.py — _has_queued_peer_group has an O(queued-tasks) linear scan that could be improved, though it does not affect correctness.

Important Files Changed

Filename Overview
packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/fair_task_queue.py New virtual-time fair queue with per-group FIFO admission and heap-based dispatch; core logic is correct but _has_queued_peer_group scans all queued-task entries rather than a group-key index, giving O(queued tasks) per blocked-group admission check.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/async_scheduler.py Main scheduler refactored to route all frontier tasks through FairTaskQueue; seed bypass, stateful-lock ordering, and salvage paths are preserved; FrontierDelta integration and deadlock guard in _drain_frontier look correct.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/completion_tracker.py CompletionTracker updated to return FrontierDelta from all mutation methods; delta construction and frontier management look correct.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/scheduling_hints.py New module resolves per-generator scheduling metadata (provider, model, weight) once at scheduler init; partial-alias fallback correctly preserves accumulated weight.
packages/data-designer-engine/src/data_designer/engine/dataset_builders/dataset_builder.py on_seeds_complete callback updated to return FrontierDelta by aggregating delta tuples from per-row drop_row calls; correct and backward-compatible (None return still accepted by scheduler).
packages/data-designer-engine/tests/engine/dataset_builders/utils/test_fair_task_queue.py New unit tests covering queue fairness, per-group admission caps, stale pruning, and peer-sensitive bypass; good coverage of key edge cases.
packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py Expanded with fair-column-admission and LLM-bound-admission integration tests; error-rate-shutdown test updated to check early_shutdown flag and log output rather than internal _rg_states, which is more accurate for the 0-survivor case.

Sequence Diagram

sequenceDiagram
    participant DL as _main_dispatch_loop
    participant FQ as FairTaskQueue
    participant SS as SubmissionSemaphore
    participant W as Worker(_execute_task_inner)
    participant CT as CompletionTracker

    DL->>FQ: has_queued_tasks?
    FQ-->>DL: true
    DL->>SS: try_acquire()
    SS-->>DL: ok
    DL->>FQ: admit_next()
    FQ-->>DL: TaskSelection(task, group)
    DL->>W: spawn _execute_task(task)
    W->>CT: mark_cell_complete / mark_row_range_complete
    CT-->>W: FrontierDelta(added, removed)
    W->>FQ: enqueue(added tasks)
    W->>FQ: discard(removed tasks)
    W->>FQ: release(task)
    W->>SS: release()
    W->>DL: wake_event.set()
Loading
Prompt To Fix All With AI
Fix the following 1 code review issue. Work through them one at a time, proposing concise fixes.

---

### Issue 1 of 1
packages/data-designer-engine/src/data_designer/engine/dataset_builders/utils/fair_task_queue.py:155-156
`_has_queued_peer_group` iterates over every entry in `_task_groups`, which maps one entry per queued task. In a workload with many queued LLM-bound tasks across a small number of groups, this scan is O(total queued tasks) rather than O(distinct groups). Maintaining a separate `Counter` or `set` of group keys with at least one queued task would make this check O(1).

```suggestion
    def _has_queued_peer_group(self, key: TaskGroupKey) -> bool:
        return any(queued_key != key for queued_key in self._queued_group_keys)
```

Reviews (5): Last reviewed commit: "Merge branch 'main' into codex/fair-asyn..." | Re-trigger Greptile

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel
Copy link
Copy Markdown
Contributor Author

Pushed the structural correction in bb26e70e to move this PR from full-frontier resyncing to an incremental virtual-time scheduler frontier.

What changed:

  • Replaced scheduler use of sync_ready(full_ready_frontier) with persistent fair-queue enqueue/discard APIs.
  • Kept the virtual-time heap policy: per-group FIFO queues, weighted virtual finish time, and existing group admission caps.
  • Added FrontierDelta from CompletionTracker mutations so newly-ready and no-longer-ready tasks flow incrementally into the scheduler.
  • Added pre-batch pending-ready handling so downstream tasks wait until the row group pre-batch callback completes, and drops during that callback remove pending tasks before dispatch.
  • Changed salvage retry of non-seed tasks to re-enqueue the specific deferred task directly instead of rediscovering it through a global frontier scan.
  • Added regression coverage that the async scheduler no longer calls get_ready_tasks during normal dispatch, plus delta, discard, pre-batch-drop, and incremental queue tests.

Local validation:

  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders/utils/test_fair_task_queue.py packages/data-designer-engine/tests/engine/dataset_builders/utils/test_completion_tracker.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py -q -> 95 passed
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders packages/data-designer-engine/tests/engine/models -q -> 992 passed
  • make check-engine -> passed
  • make check-all -> passed
  • make test -> config 562 passed, engine 2019 passed, interface 717 passed

This aligns the implementation with the scratch benchmark result: keep the fairness policy, but make readiness incremental so scheduler cost scales with work movement rather than repeatedly sorting/syncing the whole backlog.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel
Copy link
Copy Markdown
Contributor Author

Dead-code cleanup pushed in 13be1c20.

Removed:

  • Unused FairTaskSelector.enqueue_many and discard_many convenience helpers.
  • Test-only coverage for those removed helpers, with remaining tests using direct enqueue calls.
  • Obsolete _drain_frontier(..., all_columns) parameter and stale get_ready_tasks wording in the seed dispatch comment.

Validation:

  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders/utils/test_fair_task_queue.py packages/data-designer-engine/tests/engine/dataset_builders/utils/test_completion_tracker.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py -q -> 94 passed
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders packages/data-designer-engine/tests/engine/models -q -> 991 passed
  • make check-engine -> passed
  • make check-all -> passed
  • make test -> passed

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel
Copy link
Copy Markdown
Contributor Author

Refactor pushed in 0973d1fe: moved queue-policy admission state into FairTaskQueue.

Implementation shape:

  • Renamed FairTaskSelector to FairTaskQueue because it now owns both virtual-time selection and per-group admission state.
  • Replaced scheduler-side pop_next(self._can_admit_group) with admit_next().
  • Replaced scheduler-side _admitted_by_group / _task_group_by_task cleanup with FairTaskQueue.release(task).
  • Kept task group construction in the scheduler for now; this patch only moves queue/admission policy, not generator/model grouping policy.

LOC effect:

  • async_scheduler.py: 13 insertions, 24 deletions, net -11 lines.
  • fair_task_queue.py: 27 insertions, 8 deletions, now owns admission/release.

Validation:

  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders/utils/test_fair_task_queue.py packages/data-designer-engine/tests/engine/dataset_builders/test_async_scheduler.py -q -> 66 passed
  • uv run --group dev pytest packages/data-designer-engine/tests/engine/dataset_builders packages/data-designer-engine/tests/engine/models -q -> 992 passed
  • make check-engine -> passed
  • make check-all -> passed
  • make test -> passed

@andreatgretel
Copy link
Copy Markdown
Contributor

Not blocking for this PR, but I wonder if the fairness criteria should eventually incorporate observed runtime stats rather than only static configured parallelism. Right now the weight is based on max_parallel_requests, so fairness is roughly "dispatch-count weighted by configured capacity." That seems like a good first approximation, but two model groups with the same parallelism can have very different latency/throughput in practice.

Could we make the scheduler track an EWMA of per-group runtime, queue wait, or completed-throughput and adjust the effective cost/weight over time? For example, virtual finish could charge longer-running groups more wall-clock cost, or effective weight could become something like configured capacity divided by observed runtime. That might make fairness closer to actual provider throughput, especially when one model/provider is much slower or degraded. Happy to discuss as a follow-up mechanism after the liveness fix.

@andreatgretel
Copy link
Copy Markdown
Contributor

Small testing-standards nit: a few new tests exercise private scheduler internals directly (_drain_frontier, _apply_tracker_delta, _task_group_spec, _rg_states). I get why - these paths are hard to isolate cleanly - but DEVELOPMENT.md asks us to prefer public behavior tests. Could we move any of these to public scheduler.run() coverage, or leave a short rationale where the private helper test is intentional?

Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel
Copy link
Copy Markdown
Contributor Author

Addressed the testing-standards nit in 4a6a1611.

  • Converted the early-shutdown cleanup assertion away from direct _rg_states inspection to public behavior: scheduler.early_shutdown, buffered record count, and absence of unfinished-row-group error logs.
  • Kept the remaining private-helper tests where they exercise states or metadata contracts that public run() cannot expose cleanly, and added short rationale docstrings beside those tests.

Validation after merging the worker fixes:

  • make check-engine passed
  • focused scheduler/tracker/builder tests: 94 passed
  • broader engine dataset-builder/model suite: 998 passed

@eric-tramel eric-tramel merged commit 0fdea84 into main May 13, 2026
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants